Overview

Dataset info

Number of variables: 21
Number of observations: 946820
Missing cells: 18 (< 0.1%)
Duplicate rows: 32964 (3.5%)
Total size in memory: 151.7 MiB
Average record size in memory: 168.0 B
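
The dataset-level counts above can be approximated with pandas. The following is a minimal sketch on a made-up four-row frame, not the report generator's own code: the helper name `overview` and the toy values are assumptions, and `DataFrame.duplicated()` counts every row that repeats an earlier one, which may not match exactly how the report tallies duplicates.

```python
import pandas as pd


def overview(df: pd.DataFrame) -> dict:
    """Return dataset-level counts similar to the overview table."""
    n_rows, n_cols = df.shape
    missing = int(df.isna().sum().sum())     # NaN/None cells across all columns
    duplicates = int(df.duplicated().sum())  # rows identical to an earlier row
    return {
        "variables": n_cols,
        "observations": n_rows,
        "missing_cells": missing,
        "missing_pct": 100.0 * missing / (n_rows * n_cols),
        "duplicate_rows": duplicates,
        "duplicate_pct": 100.0 * duplicates / n_rows,
    }


# Toy data (hypothetical values, not taken from the profiled dataset).
toy = pd.DataFrame(
    {
        "amount": [12.95, 12.95, 38.85, None],
        "state1": ["CA", "CA", "FL", "TX"],
    }
)
summary = overview(toy)
print(summary)
```

Percentages are taken relative to all cells for missing values and relative to all rows for duplicates, which is consistent with the 18 missing cells rounding to "< 0.1%" in the overview.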

Variable types

Numeric: 9
Categorical: 2
Boolean: 8
Date: 0
URL: 0
Text (Unique): 0
Rejected: 2
Unsupported: 0

Warnings

Dataset has 32964 (3.5%) duplicate rows [Warning]
amount has 10740 (1.1%) zeros [Zeros]
domain1 has a high cardinality: 9810 distinct values [Warning]
field1 has 101470 (10.7%) zeros [Zeros]
field5 has 586120 (61.9%) zeros [Zeros]
flag5 is highly skewed (γ1 = 25.44446999) [Skewed]
hour1 has 20020 (2.1%) zeros [Zeros]
hour2 is highly correlated with hour1 (ρ = 0.9947080158) [Rejected]
state1 has a high cardinality: 53 distinct values [Warning]
total is highly correlated with amount (ρ = 0.9994216775) [Rejected]
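
The two "Rejected" warnings come from pairwise correlation. The sketch below shows one way such a rule could be expressed with pandas. It is an assumption-laden illustration: the function name `find_rejected`, the 0.9 cutoff, and the toy values are all hypothetical, and the report does not state which threshold or column ordering it actually uses.

```python
import pandas as pd


def find_rejected(df: pd.DataFrame, threshold: float = 0.9) -> dict:
    """Map each rejected column to the earlier kept column it mirrors."""
    corr = df.corr(method="pearson")
    kept, rejected = [], {}
    for col in df.columns:
        # Reject col if it is strongly correlated with a column already kept.
        match = next((k for k in kept if abs(corr.loc[col, k]) > threshold), None)
        if match is None:
            kept.append(col)
        else:
            rejected[col] = match
    return rejected


# Toy data (hypothetical): hour2 and total nearly duplicate hour1 and amount.
toy = pd.DataFrame(
    {
        "amount": [12.95, 38.85, 10.36, 49.95, 25.90, 38.85],
        "hour1": [0, 5, 14, 20, 23, 9],
        "hour2": [0, 5, 14, 20, 23, 10],
        "total": [12.95, 38.85, 10.36, 49.95, 25.90, 39.85],
    }
)
rejected = find_rejected(toy)
print(rejected)
```

Columns are scanned in order, so the first member of a correlated pair is kept and the later one is dropped, mirroring how hour2 is rejected in favour of hour1 and total in favour of amount.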

Variables

amount
Numeric

Distinct count: 88
Unique (%): < 0.1%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
Mean: 25.6353121
Minimum: 0
Maximum: 95.4
Zeros (%): 1.1%

Quantile statistics

Minimum: 0
5-th percentile: 10.36
Q1: 12.95
Median: 25.9
Q3: 38.85
95-th percentile: 49.95
Maximum: 95.4
Range: 95.4
Interquartile range: 25.9

Descriptive statistics

Standard deviation: 14.19041741
Coef of variation: 0.5535496253
Kurtosis: -1.154029748
Mean: 25.6353121
MAD: 13.17173095
Skewness: 0.3167990242
Sum: 24272026.2
Variance: 201.3679462
Memory size: 7.2 MiB
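
Several of the figures above are derived from the others, which makes them easy to cross-check. The sketch below recomputes them from the reported values for amount using only the standard definitions (range, interquartile range, coefficient of variation, variance, sum); the variable names are illustrative.

```python
import math

# Values copied from the amount summary above.
n_obs = 946820
mean, std = 25.6353121, 14.19041741
minimum, q1, q3, maximum = 0.0, 12.95, 38.85, 95.4

value_range = maximum - minimum  # Range
iqr = q3 - q1                    # Interquartile range
cv = std / mean                  # Coefficient of variation
variance = std ** 2              # Variance
total = mean * n_obs             # Sum

print(f"range={value_range:.1f} iqr={iqr:.2f} cv={cv:.6f}")
print(f"variance={variance:.4f} sum={total:.1f}")

# Each derived value should agree with the reported one up to rounding.
assert math.isclose(cv, 0.5535496253, rel_tol=1e-6)
assert math.isclose(variance, 201.3679462, rel_tol=1e-6)
assert math.isclose(total, 24272026.2, rel_tol=1e-6)
```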
Histogram with fixed size bins (bins=50)
Histogram with variable size bins (bins=[ 0. 4.75 9.605 9.995 10.18 ... 64.875 78.135 84.95 92.7 95.4 ], "bayesian blocks" binning strategy used)

Value Count Frequency (%)
12.95 339070 35.8%
38.85 305980 32.3%
10.36 72450 7.7%
49.95 61200 6.5%
25.9 49670 5.2%
31.08 34080 3.6%
11.01 26970 2.8%
0 10740 1.1%
20.72 9940 1.0%
59.4 7430 0.8%
Other values (78) 29290 3.1%

Minimum 5 values

Value Count Frequency (%)
0 10740 1.1%
9.5 120 < 0.1%
9.71 220 < 0.1%
9.99 100 < 0.1%
10 20 < 0.1%

Maximum 5 values

Value Count Frequency (%)
95.4 290 < 0.1%
90 70 < 0.1%
89.95 90 < 0.1%
79.95 60 < 0.1%
76.32 10 < 0.1%

df_index
Numeric

Distinct count: 85214
Unique (%): 9.0%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
Mean: 38819.12
Minimum: 0
Maximum: 85213
Zeros (%): < 0.1%

Quantile statistics

Minimum: 0
5-th percentile: 2367
Q1: 14202
Median: 37872
Q3: 61543
95-th percentile: 80479
Maximum: 85213
Range: 85213
Interquartile range: 47341

Descriptive statistics

Standard deviation: 25970.05113
Coef of variation: 0.6690015416
Kurtosis: -1.284262532
Mean: 38819.12
MAD: 22733.1482
Skewness: 0.1256088471
Sum: 3.67547192e+10
Variance: 674443555.5
Memory size: 7.2 MiB
Histogram with fixed size bins (bins=50)
Histogram with variable size bins (bins=[ 0. 9467.5 85213. ], "bayesian blocks" binning strategy used)

Value Count Frequency (%)
2047 20 < 0.1%
4396 20 < 0.1%
7981 20 < 0.1%
5932 20 < 0.1%
3375 20 < 0.1%
1326 20 < 0.1%
7469 20 < 0.1%
5420 20 < 0.1%
2863 20 < 0.1%
814 20 < 0.1%
Other values (85204) 946620 > 99.9%

Minimum 5 values

Value Count Frequency (%)
0 20 < 0.1%
1 20 < 0.1%
2 20 < 0.1%
3 20 < 0.1%
4 20 < 0.1%

Maximum 5 values

Value Count Frequency (%)
85213 8 < 0.1%
85212 10 < 0.1%
85211 10 < 0.1%
85210 10 < 0.1%
85209 10 < 0.1%

domain1
Categorical

Distinct count: 9810
Unique (%): 1.0%
Missing (%): < 0.1%
Missing (n): 10

AOL.COM 164510
YAHOO.COM 158140
HOTMAIL.COM 115440
Other values (9806) 508720

Value Count Frequency (%)
AOL.COM 164510 17.4%
YAHOO.COM 158140 16.7%
HOTMAIL.COM 115440 12.2%
MSN.COM 40290 4.3%
COMCAST.NET 39180 4.1%
SBCGLOBAL.NET 24980 2.6%
COX.NET 21640 2.3%
EARTHLINK.NET 20890 2.2%
BELLSOUTH.NET 16990 1.8%
VERIZON.NET 10250 1.1%
Other values (9799) 334500 35.3%

Max length: 38
Mean length: 10.25557128
Min length: 3
Contains chars: True
Contains digits: True
Contains spaces: False
Contains non-words: True
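
For a high-cardinality column such as domain1, the frequency table keeps the most common values and folds the remainder into a single "Other values (k)" row. The sketch below reproduces that layout with the standard library. The function name `frequency_table`, the choice to exclude missing values from the counts, and the toy list are assumptions for illustration; percentages are taken relative to all rows, which matches AOL.COM at 164510 of 946820 rows being shown as 17.4%.

```python
from collections import Counter


def frequency_table(values, top_n=10):
    """Return (label, count, percent) rows with a trailing 'Other values' row."""
    counts = Counter(v for v in values if v is not None)
    total_rows = len(values)  # percentages are relative to all rows
    top = counts.most_common(top_n)
    rows = [(label, count, 100.0 * count / total_rows) for label, count in top]
    other_count = sum(counts.values()) - sum(count for _, count in top)
    other_distinct = len(counts) - len(top)
    if other_distinct > 0:
        rows.append(
            (f"Other values ({other_distinct})", other_count,
             100.0 * other_count / total_rows)
        )
    return rows


# Toy data (hypothetical): seven rows, four distinct domains.
domains = ["AOL.COM"] * 3 + ["YAHOO.COM"] * 2 + ["MSN.COM", "COX.NET"]
table = frequency_table(domains, top_n=2)
for label, count, pct in table:
    print(f"{label} {count} {pct:.1f}%")
```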

field1
Numeric

Distinct count: 5
Unique (%): < 0.1%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
Mean: 2.419203228
Minimum: 0
Maximum: 4
Zeros (%): 10.7%

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 2
Median: 3
Q3: 3
95-th percentile: 3
Maximum: 4
Range: 4
Interquartile range: 1

Descriptive statistics

Standard deviation: 1.003755571
Coef of variation: 0.414911637
Kurtosis: 1.012088836
Mean: 2.419203228
MAD: 0.7940265171
Skewness: -1.3232963
Sum: 2290550
Variance: 1.007525247
Memory size: 7.2 MiB
Histogram with fixed size bins (bins=5)

Value Count Frequency (%)
3 550020 58.1%
2 238030 25.1%
0 101470 10.7%
4 35710 3.8%
1 21590 2.3%

Minimum 5 values

Value Count Frequency (%)
0 101470 10.7%
1 21590 2.3%
2 238030 25.1%
3 550020 58.1%
4 35710 3.8%

Maximum 5 values

Value Count Frequency (%)
4 35710 3.8%
3 550020 58.1%
2 238030 25.1%
1 21590 2.3%
0 101470 10.7%

field2
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing (%): 0.0%
Missing (n): 0

Value Count Frequency (%)
0 543890 57.4%
1 402930 42.6%

field3
Numeric

Distinct count: 15786
Unique (%): 1.7%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
Mean: 714.5126529
Minimum: -32265
Maximum: 8193
Zeros (%): < 0.1%

Quantile statistics

Minimum: -32265
5-th percentile: -6572
Q1: -1551
Median: 1455
Q3: 3598
95-th percentile: 5859
Maximum: 8193
Range: 40458
Interquartile range: 5149

Descriptive statistics

Standard deviation: 3919.306449
Coef of variation: 5.485286276
Kurtosis: 1.008788952
Mean: 714.5126529
MAD: 3104.705182
Skewness: -0.899935895
Sum: 676514870
Variance: 15360963.04
Memory size: 7.2 MiB
Histogram with fixed size bins (bins=50)
Histogram with variable size bins (bins=[-32265. -27037.5 -20615. -19523. -19305.5 ... 8088.5 8092.5 8137.5 8144.5 8193. ], "bayesian blocks" binning strategy used)

Value Count Frequency (%)
-5437 1040 0.1%
249 770 0.1%
-956 760 0.1%
1022 750 0.1%
1086 740 0.1%
30 720 0.1%
1623 670 0.1%
3340 600 0.1%
2634 580 0.1%
3330 550 0.1%
Other values (15776) 939640 99.2%

Minimum 5 values

Value Count Frequency (%)
-32265 10 < 0.1%
-30554 50 < 0.1%
-23521 10 < 0.1%
-21927 10 < 0.1%
-21705 10 < 0.1%

Maximum 5 values

Value Count Frequency (%)
8193 10 < 0.1%
8148 10 < 0.1%
8141 20 < 0.1%
8138 10 < 0.1%
8137 10 < 0.1%

field4
Numeric

Distinct count: 38
Unique (%): < 0.1%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
Mean: 13.98411525
Minimum: 6
Maximum: 46
Zeros (%): 0.0%

Quantile statistics

Minimum: 6
5-th percentile: 6
Q1: 8
Median: 12
Q3: 19
95-th percentile: 25
Maximum: 46
Range: 40
Interquartile range: 11

Descriptive statistics

Standard deviation: 6.516819922
Coef of variation: 0.4660158906
Kurtosis: -0.6524030389
Mean: 13.98411525
MAD: 5.752905436
Skewness: 0.6164478364
Sum: 13240440
Variance: 42.46894189
Memory size: 7.2 MiB
Histogram with fixed size bins (bins=38)
Histogram with variable size bins (bins=[ 6. 6.5 7.5 8.5 9.5 ... 37.5 38.5 39.5 40.5 46. ], "bayesian blocks" binning strategy used)

Value Count Frequency (%)
9 127010 13.4%
8 99280 10.5%
7 79670 8.4%
10 65140 6.9%
6 61340 6.5%
18 44100 4.7%
20 43110 4.6%
21 40140 4.2%
19 40090 4.2%
11 38840 4.1%
Other values (28) 308100 32.5%

Minimum 5 values

Value Count Frequency (%)
6 61340 6.5%
7 79670 8.4%
8 99280 10.5%
9 127010 13.4%
10 65140 6.9%

Maximum 5 values

Value Count Frequency (%)
46 10 < 0.1%
42 10 < 0.1%
41 10 < 0.1%
40 30 < 0.1%
39 70 < 0.1%

field5
Numeric

Distinct count: 26
Unique (%): < 0.1%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
Mean: 1.375689149
Minimum: 0
Maximum: 26
Zeros (%): 61.9%

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 2
95-th percentile: 9
Maximum: 26
Range: 26
Interquartile range: 2

Descriptive statistics

Standard deviation: 2.423929687
Coef of variation: 1.761974854
Kurtosis: 6.183346695
Mean: 1.375689149
MAD: 1.761011311
Skewness: 2.291718656
Sum: 1302530
Variance: 5.875435128
Memory size: 7.2 MiB
Histogram with fixed size bins (bins=26)
Histogram with variable size bins (bins=[ 0. 0.5 1.5 2.5 3.5 ... 21.5 22.5 23.5 24.5 26. ], "bayesian blocks" binning strategy used)

Value Count Frequency (%)
0 586120 61.9%
2 95220 10.1%
1 72830 7.7%
4 67500 7.1%
9 52460 5.5%
3 48880 5.2%
5 11270 1.2%
6 7540 0.8%
8 1700 0.2%
7 1680 0.2%
Other values (16) 1620 0.2%

Minimum 5 values

Value Count Frequency (%)
0 586120 61.9%
1 72830 7.7%
2 95220 10.1%
3 48880 5.2%
4 67500 7.1%

Maximum 5 values

Value Count Frequency (%)
26 20 < 0.1%
25 50 < 0.1%
24 10 < 0.1%
23 130 < 0.1%
22 30 < 0.1%

flag1
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing (%): 0.0%
Missing (n): 0

Value Count Frequency (%)
1 517900 54.7%
0 428920 45.3%

flag2
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing (%): 0.0%
Missing (n): 0

Value Count Frequency (%)
1 512350 54.1%
0 434470 45.9%

flag3
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing (%): 0.0%
Missing (n): 0

Value Count Frequency (%)
0 566970 59.9%
1 379850 40.1%

flag4
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing (%): 0.0%
Missing (n): 0

Value Count Frequency (%)
0 927990 98.0%
1 18830 2.0%

flag5
Numeric

Distinct count: 36
Unique (%): < 0.1%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
Mean: 6.176675609
Minimum: 0
Maximum: 3278
Zeros (%): < 0.1%

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 1
Median: 1
Q3: 1
95-th percentile: 2
Maximum: 3278
Range: 3278
Interquartile range: 0

Descriptive statistics

Standard deviation: 102.9769538
Coef of variation: 16.67190578
Kurtosis: 714.8784431
Mean: 6.176675609
MAD: 9.72390082
Skewness: 25.44446999
Sum: 5848200
Variance: 10604.253
Memory size: 7.2 MiB
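
The large skewness reported for flag5 reflects a distribution that sits almost entirely at 1 and 2 with a few values in the hundreds or thousands. The sketch below computes the plain moment-based skewness, g1 = m3 / m2^1.5, on a toy sample with the same shape. The report does not say which estimator it uses (some libraries apply a bias correction), so treat the function `skewness` and the toy data as illustrative assumptions.

```python
import math


def skewness(values):
    """Population skewness g1 = m3 / m2**1.5, with no bias correction."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Toy sample (hypothetical): nine ordinary values and one extreme value.
toy = [1] * 9 + [100]
g1 = skewness(toy)
print(f"skewness of toy sample: {g1:.4f}")

# A symmetric sample has zero skewness.
assert math.isclose(skewness([1, 2, 3]), 0.0, abs_tol=1e-12)
```

A single extreme value in ten already gives a clearly positive skewness. In flag5 the extreme values are a far smaller share of the rows, which pushes the statistic much higher.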
Histogram with fixed size bins (bins=36)
Histogram with variable size bins (bins=[0.0000e+00 5.0000e-01 1.5000e+00 2.5000e+00 3.5000e+00 ... 1.2095e+03 1.5170e+03 1.6215e+03 2.4605e+03 3.2780e+03], "bayesian blocks" binning strategy used)

Value Count Frequency (%)
1 724150 76.5%
2 176920 18.7%
3 30750 3.2%
4 6770 0.7%
5 1960 0.2%
6 990 0.1%
1600 950 0.1%
7 760 0.1%
3278 590 0.1%
364 480 0.1%
Other values (26) 2500 0.3%

Minimum 5 values

Value Count Frequency (%)
0 140 < 0.1%
1 724150 76.5%
2 176920 18.7%
3 30750 3.2%
4 6770 0.7%

Maximum 5 values

Value Count Frequency (%)
3278 590 0.1%
1643 280 < 0.1%
1600 950 0.1%
1434 110 < 0.1%
985 20 < 0.1%

fraud
Boolean

Distinct count: 3
Unique (%): < 0.1%
Missing (%): < 0.1%
Missing (n): 8

Value Count Frequency (%)
0 926862 97.9%
1 19950 2.1%
(Missing) 8 < 0.1%

hour1
Numeric

Distinct count: 24
Unique (%): < 0.1%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
Mean: 13.86472614
Minimum: 0
Maximum: 23
Zeros (%): 2.1%

Quantile statistics

Minimum: 0
5-th percentile: 4
Q1: 10
Median: 14
Q3: 18
95-th percentile: 22
Maximum: 23
Range: 23
Interquartile range: 8

Descriptive statistics

Standard deviation: 5.263208162
Coef of variation: 0.3796114045
Kurtosis: -0.02842532144
Mean: 13.86472614
MAD: 4.22002196
Skewness: -0.4351781733
Sum: 13127400
Variance: 27.70136016
Memory size: 7.2 MiB
Histogram with fixed size bins (bins=24)
Histogram with variable size bins (bins=[ 0. 0.5 1.5 2.5 3.5 ... 19.5 20.5 21.5 22.5 23. ], "bayesian blocks" binning strategy used)

Value Count Frequency (%)
13 70560 7.5%
14 68250 7.2%
12 67380 7.1%
11 66920 7.1%
15 64780 6.8%
10 60840 6.4%
16 60200 6.4%
17 56630 6.0%
18 51940 5.5%
9 50320 5.3%
Other values (14) 329000 34.7%

Minimum 5 values

Value Count Frequency (%)
0 20020 2.1%
1 12750 1.3%
2 7490 0.8%
3 5280 0.6%
4 4750 0.5%

Maximum 5 values

Value Count Frequency (%)
23 28160 3.0%
22 35160 3.7%
21 42350 4.5%
20 47940 5.1%
19 49570 5.2%

hour2
Highly correlated

This variable is highly correlated with hour1 and should be ignored for analysis.

Correlation: 0.9947080158

indicator1
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing (%): 0.0%
Missing (n): 0

Value Count Frequency (%)
0 839540 88.7%
1 107280 11.3%

indicator2
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing (%): 0.0%
Missing (n): 0

Value Count Frequency (%)
0 929430 98.2%
1 17390 1.8%

state1
Categorical

Distinct count: 53
Unique (%): < 0.1%
Missing (%): 0.0%
Missing (n): 0

CA 186760
FL 84360
TX 66200
Other values (50) 609500

Value Count Frequency (%)
CA 186760 19.7%
FL 84360 8.9%
TX 66200 7.0%
NY 57950 6.1%
GA 44790 4.7%
VA 37030 3.9%
IL 36040 3.8%
AZ 32360 3.4%
MD 26710 2.8%
NJ 26600 2.8%
Other values (43) 348020 36.8%

Max length: 2
Mean length: 2
Min length: 2
Contains chars: True
Contains digits: False
Contains spaces: False
Contains non-words: False

total
Highly correlated

This variable is highly correlated with amount and should be ignored for analysis.

Correlation: 0.9994216775

zip1
Numeric

Distinct count: 899
Unique (%): 0.1%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
Mean: 543.2437422
Minimum: 2
Maximum: 999
Zeros (%): 0.0%

Quantile statistics

Minimum: 2
5-th percentile: 76
Q1: 282
Median: 530
Q3: 891
95-th percentile: 956
Maximum: 999
Range: 997
Interquartile range: 609

Descriptive statistics

Standard deviation: 315.3723025
Coef of variation: 0.580535546
Kurtosis: -1.508401046
Mean: 543.2437422
MAD: 288.6651076
Skewness: -0.0288278305
Sum: 514354040
Variance: 99459.68918
Memory size: 7.2 MiB
Histogram with fixed size bins (bins=50)
Histogram with variable size bins (bins=[ 2. 4. 6.5 8.5 9.5 ... 993.5 994.5 995.5 997.5 999. ], "bayesian blocks" binning strategy used)

Value Count Frequency (%)
300 16180 1.7%
945 15800 1.7%
852 12290 1.3%
926 12010 1.3%
606 11480 1.2%
750 11020 1.2%
900 10690 1.1%
100 9840 1.0%
331 9820 1.0%
921 9660 1.0%
Other values (889) 828030 87.5%

Minimum 5 values

Value Count Frequency (%)
2 30 < 0.1%
6 110 < 0.1%
7 90 < 0.1%
8 100 < 0.1%
9 260 < 0.1%

Maximum 5 values

Value Count Frequency (%)
999 90 < 0.1%
998 230 < 0.1%
997 550 0.1%
996 500 0.1%
995 1350 0.1%

Correlations

Missing values

Sample

First rows

amount df_index domain1 field1 field2 field3 field4 field5 flag1 flag2 flag3 flag4 flag5 fraud hour1 hour2 indicator1 indicator2 state1 total zip1
025.900BELLSOUTH.NET31387880101010.00000FL25.90331
138.851COMCAST.NET21-6330211011010.00000TX38.85750
238.852HOTMAIL.COM205183191000010.01100VA38.85222
324.953GMAIL.COM003822160000010.01100CA24.95946
420.724LEVEL3.COM30353681111010.01100CO20.72805
512.955AOL.COM216110172000010.03310AZ12.95857
610.366AOL.COM20779101111120.05500NY10.36112
710.367HOTMAIL.COM20-543580111120.05500PA10.36152
810.368AOL.COM20585694111040.06600GA10.36301
938.859SBCGLOBAL.NET21542270110010.07700CA38.85956

Last rows

amount df_index domain1 field1 field2 field3 field4 field5 flag1 flag2 flag3 flag4 flag5 fraud hour1 hour2 indicator1 indicator2 state1 total zip1
94681039.9685204COX.NET01-2284170011010.0141400CA39.96926
94681139.9685205COX.NET01-2284170011010.0141400CA39.96926
94681212.9585206HOTMAIL.COM305229110100011.0161610KS12.95670
94681338.8585207AOL.COM31-131782011041.0202000AZ38.85850
94681412.9585208AOL.COM31-131782011040.0202000AZ12.95850
94681538.8585209AOL.COM31-582162011030.0202000MI38.85481
94681612.9585210HOTMAIL.COM2018299111011.0161600GA12.95300
94681738.8585211AOL.COM21-291482011031.05500CA38.85906
94681838.8585212AOL.COM20153072011011.06600NY38.85105
94681938.8585213AOL.COM20-6768201104NaN191900AZ38.85857